ABSTRACT

For individualized or computer assisted instruction, 
norm referenced testing is inadequate to determine each individual’s 
mastery on specific kinds of tasks. Hively’s item forms and 
Ferguson's stratified item forms, both based on observable
characteristics of the problems, and Scandura's algorithmic
technology, positing that persons use rules to solve problems and 
thus that problems should be partitioned on the basis of rules needed 
to solve them, have been developed to measure individual mastery. 

This study was designed to compare their effectiveness and efficiency 
in assessing mastery of column subtraction problems. All three 
methods were essentially equal in predicting mastery of individual 
items, but the algorithmic method used far fewer items and thus was 
more efficient. The item forms technology would seem to have a slight 
advantage in the ease with which a computer could randomly generate 
test items, but even items for the algorithmic form can be computer 
generated, although slightly indirectly. (RH) 
Recent research in individualized (e.g., Lipson, 1967) and computer
assisted (e.g., Suppes, 1966) instruction has led to an increasing aware- 
ness of the inadequacies of norm referenced testing and the need for 
testing procedures which determine each individual’s mastery on specific 
types of tasks (e.g., Coulson & Cogswell, 1965). Knowing how well a 
student has performed relative to some peer group, for example, says 
relatively little about the kinds of decisions that must be made if instruc- 
tion is to be totally individualized. Ideally, in mastery testing the 
procedures used should 1) provide a sound basis for diagnosing individual 
strengths and weaknesses on each type of task, 2) require as few items as 
possible, and 3) provide a basis for generalizing from overall test per- 
formance to behavior on a clearly defined universe or domain of tasks. 



¹This article is based on a Ph.D. dissertation submitted by the first
author under the second author's chairmanship to the University of
Pennsylvania. This study was supported by U.S. Office of Education Grant
3-71-0136 and, in part, by National Science Foundation Grant GW6796, both to
the second author.



If, in addition, items can be ordered according to difficulty to allow 
for conditional (sequential) testing, efficiency could be further 
increased. 

Fortunately, a number of new technologies have recently been
developed for constructing tests that have the above characteristics (e.g.,
Ferguson, 1969; Hively, Patterson & Page, 1968; Johnson, 1970; Nitko,
1970; Osburn, 1968; Rabehl, 1970; Roudabush & Green, 1971; Scandura,
1971a, 1972). The purpose of this study was to compare three of these
technologies with respect to these characteristics: the item forms
technology (domain referenced testing) of Hively et al. (1968), the
hierarchical or stratified item forms technology of Ferguson (1969), and
the algorithmic technology of Scandura (1971a, 1972).

In domain referenced testing, a defined universe or domain of items
(e.g., column subtraction problems) is subdivided into classes of items
or item forms on the basis of observable properties the items in each class
have in common. Osburn (1968) characterized an item form as having a fixed
syntactical structure (e.g., $x - y$), one or more elements (e.g., $x$, $y$),
and explicit criteria for specifying which elements belong to the form
(e.g., $x = x_1 x_2$; $y = y_1 y_2$; $y_1 < x_1$; $y_2 < x_2$; $x_1, x_2, y_1, y_2 \in \{\ldots, 9\}$).

To assess pupil performance on a given domain of problems a test is construc- 
ted by randomly selecting one item from each of the identified forms. 
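The random selection of one item per item form can be illustrated with a minimal sketch. It assumes a single hypothetical item form for two-digit subtraction in which each subtrahend digit is smaller than the minuend digit above it, as in the criteria quoted from Osburn (1968). The function names, the `low` parameter, and its default value are illustrative assumptions, not part of the original item forms.

```python
import random


def two_digit_no_borrow_item(rng, low=1):
    """Draw one (minuend, subtrahend) pair from a hypothetical item form.

    The constraints y1 < x1 and y2 < x2 follow the criteria quoted above.
    `low`, the smallest digit allowed, is an assumption of this sketch.
    """
    x1 = rng.randint(low + 1, 9)   # tens digit of the minuend
    x2 = rng.randint(low + 1, 9)   # units digit of the minuend
    y1 = rng.randint(low, x1 - 1)  # tens digit of the subtrahend, below x1
    y2 = rng.randint(low, x2 - 1)  # units digit of the subtrahend, below x2
    return 10 * x1 + x2, 10 * y1 + y2


def build_test(item_forms, rng):
    """Construct a test by drawing one random item from each item form."""
    return [form(rng) for form in item_forms]


if __name__ == "__main__":
    rng = random.Random(0)
    for minuend, subtrahend in build_test([two_digit_no_borrow_item] * 3, rng):
        print(f"{minuend} - {subtrahend} = ?")
```

Representing each item form as a generator function keeps test construction to one draw per form, which mirrors the procedure described in the text.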

The authors thank Alfonso Georeno and David Shore for their cooperation
in providing subjects. The authors would also like to thank Frederick Davis,
James Diamond, Zoltan Domotor and Albert Oliver for helpful comments on an
earlier version of this paper.

It was felt by Hively et al. (1968) that item forms might be used
not only to assess a pupil's overall performance on the domain of problems
but also to predict his behavior on specific problems in the domain.

That is, if a subject were successful on one problem belonging to an item 
form, then he would be successful on any other problem of the same form, 
and similarly if he were unsuccessful on a problem belonging to an item 
form, he would be unsuccessful on any other problem of the same form. 

Although Hively et al. (1968) were able to obtain high coefficients of 
generalizability (Cronbach, Rajaratnam, & Gleser, 1963; Rajaratnam, Cron- 
bach, & Gleser, 1965) for tests based on the item forms technology, they 
did not find that item forms, in general, represented homogeneous categories
of problems of the type described above.

One criticism of the item forms technology has been that the 
hierarchical relationships among item forms have not been taken into 
account in testing (e.g., Nitko, 1970). In a recent study by Ferguson
(1969) these relationships were dealt with explicitly. In this study, item
forms were generated for both terminal and prerequisite instructional
objectives in a way analogous to task analysis (e.g., Gagne, 1962).

Starting with a terminal item form, corresponding to a terminal instructional 
objective, sub-item forms (i.e., subobjectives) were identified which were 
considered prerequisite to the terminal item form. The item forms so
identified were then ordered according to the hypothesized hierarchical 
structure and a computer was programmed to make branching decisions based 
on probabilistic evaluations of student performance on each of the forms. 
Clearly, a conditional testing procedure of this sort could conceivably 
provide a highly efficient basis for assessing the behavior potential of 
individual subjects. 
Although the technologies for assessing mastery developed by Hively
et al. (1968) and Ferguson (1969) appear to be major steps toward improved 
mastery and diagnostic testing, they are subject to one fundamental 
criticism. There is no real theoretical basis for either technology.
With the possible exception of Ferguson's hierarchical ordering of forms,
which is based essentially on task analysis, there is little basis other 
than (possible) sound intuitive judgment as to how items should be cate- 
gorized. As a result, both technologies can be criticized on a priori 
grounds. For example, the item forms identified for subtraction by 
Hively et al., and those identified by Ferguson, both failed to partition
the domain of subtraction problems into mutually exclusive and exhaustive
classes (i.e., equivalence classes). This lack of partition may very well
have contributed to Hively et al.'s finding that item forms did not
represent homogeneous classes of items. In general, it is not an easy task to
generate item forms which will partition a domain. Also, once a set of 
item forms has been generated, it is difficult to determine whether or not 
the item forms do indeed form a partition. 

Furthermore, neither technology specifically takes into account the 
knowledge which makes it possible to solve problems belonging to a given 
domain. This is an important limitation because there can be any number 
of ways of solving problems within a domain. For example, there are 
several common rules a pupil may use to solve subtraction problems. His 
performance on such problems could be due to his mastery of any one of 
these rules. (Identifying what rules may be used on a domain of problems 
also has important implications for providing remediation, and more is 
said on this below.)

Scandura's (1971a, 1972) theory of structural learning provides a
theoretical basis for an algorithmic technology for assessing behavior
potential which deals directly with the above problems. This theory
consists of three hierarchically related partial theories: a theory of
knowledge, a memory-free theory of learning and performance, and a theory
of memory. For present purposes two basic assumptions of the memory-free
theory suffice. Stated simply, they are that people use rules to solve
problems and that if an individual has learned a rule for solving a given
problem or task, then he will use it.

To see how these assumptions are involved, notice that if an observer
knows what rule or rules a subject has available for solving a given
domain of problems, then he can predict perfectly the subject's performance
on problems in that domain. Unfortunately, the observer generally has no
a priori way of knowing this. Nonetheless, with many familiar tasks (e.g.,
ordinary subtraction) there is a limited number of rules that subjects
in a given population are most likely to use (e.g., the "borrowing" and
"equal addition" methods for subtraction), and the first step in assessing
behavior potential is for the observer-theorist to identify them.

It does not necessarily follow, of course, that every subject (or 
even any subject) will know any one of these rules completely. Rules
consist of operations and branching decisions (i.e., subrules) which are 
performed in certain specified orders (see Scandura, 1970b, 1971a). 

The branching decisions of the rule serve to combine the operations in 
different ways for solving different kinds of problems. Thus a subject 
may know part of a rule or parts of several rules and, hence, may solve
certain tasks governed by the rule(s) but not others. The object of 
testing is to determine from a subject's performance on a limited number 
of problems what parts of the rule or rules he knows and what parts he 
does not know. 

Now the operations and branching decisions of a rule can be described
or listed in much the same way that one constructs a computer program.
(An alternative description is a flow chart. When discussing rules in which
the operations and branching decisions are made explicit in either of these
two ways, the term algorithm is used.) From the list or program one can
see that there are a finite number of ways in which the subrules may be
combined or sequenced to solve problems.² These sequences of subrules,
called paths, partition the domain of tasks governed by an algorithm
into equivalence classes.

Consider, for example, the domain described by "Find sums (less than
100) for column addition using two or more addends of one digit."³ An
algorithm governing this domain may be characterized by the following
program:



²Some of the sequences involve cycles or loops in which the same
subrules may be repeated indefinitely. Each traversal through a loop,
of course, generates a new extended sequence of the same subrules.
However, because no new subrules are added or deleted, these sequences
are considered equivalent.

³This description of a class of tasks was adapted from a list of
objectives for the Individualized Prescribed Instruction Program at the
University of Pittsburgh's Learning Research and Development Center,
September, 1965.
1. Add the top two addends.

2. If there are no other addends, go to 3; 
otherwise go to 4. 

3. Write the sum and stop. 

4. Add the units digit of the obtained 
sum to the next addend. 

5. If the sum is greater than 10, go to 6; 
otherwise go to 7. 

6. Add 1 to whatever is in the tens place 
and return to 2. 

7. Return to 2. 
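The seven-step program above can be sketched in executable form. The sketch records which numbered steps were executed and then classifies a problem by the set of steps used, so that repeated loop traversals count as the same path. Two points are assumptions of the sketch rather than statements from the source: the carry test uses "10 or more" (the program text says "greater than 10"), and the mapping from step sets to path numbers 1 to 4 is an interpretation of the path descriptions given in the text.

```python
def column_add(addends):
    """Follow the seven-step program; return (sum, list of steps executed)."""
    if len(addends) < 2:
        raise ValueError("the domain requires two or more addends")
    remaining = list(addends[2:])
    tens, units = divmod(addends[0] + addends[1], 10)  # step 1
    steps = [1]
    while True:
        steps.append(2)                  # step 2: any addends left?
        if not remaining:
            steps.append(3)              # step 3: write the sum and stop
            return 10 * tens + units, steps
        steps.append(4)                  # step 4: add units digit to next addend
        partial = units + remaining.pop(0)
        steps.append(5)                  # step 5: does the sum reach ten?
        if partial >= 10:                # assumption: carry at 10 or more
            steps.append(6)              # step 6: add 1 to the tens place
            tens += 1
            units = partial - 10
        else:
            steps.append(7)              # step 7: no carry, return to step 2
            units = partial


def path_of(steps):
    """Classify a step trace into path 1-4 (assumed mapping, see lead-in)."""
    carried = 6 in steps
    not_carried = 7 in steps
    if not carried and not not_carried:
        return 1   # only two addends
    if not carried:
        return 2   # extra addends, tens place never incremented
    if not not_carried:
        return 3   # extra addends, tens place always incremented
    return 4       # extra addends, both outcomes occur


if __name__ == "__main__":
    for problem in ([3, 4], [1, 2, 3], [7, 8, 9], [5, 6, 2, 9]):
        total, steps = column_add(problem)
        print(problem, total, "path", path_of(steps))
```

Classifying by the set of steps rather than the full trace reflects the convention that sequences differing only in the number of loop traversals are considered equivalent.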
This algorithm can be represented by a directed
graph in which the numbered arcs correspond to subrules and
points to branching decisions (i.e., "if" statements) as
follows:



[Directed graph of the algorithm, running from START to STOP; not reproduced.]

From the graph it can be determined that there are four
paths (i.e., sequences of subrules) through the algorithm.
(The arc sequence and sample problem printed with each path are
illegible in this copy.)

a. Path 1 is used to solve problems having only two addends.

b. Path 2 is used to solve problems having more than two addends but
with intermediate sums less than ten and the final sum less than nineteen.

c. Path 3 is used to solve problems having more than two addends where
successive sums increment the tens place.

d. Path 4 is used to solve problems having more than two addends where the
successive sums may or may not increment the tens place.
It is easy to see from this example, then, that paths partition the
domain governed by an algorithm into equivalence classes. That is, two 
problems are equivalent if and only if they are solvable by the same path 
through the algorithm. 

If the constituent subrules of an algorithm are atomic (i.e., a
subrule can be used by a subject on all or none of its instances) for
any given subject, then it follows logically that the paths of the
algorithm will also be atomic. This implies that if the subject is
successful on any one item of an equivalence class, then he should be
successful on any other and similarly for failure. Hence, to assess
his behavior potential all that is needed is one item from each
equivalence class.

As was mentioned earlier, of course, there may be more than one 
feasible algorithm underlying a domain of tasks. If several algorithms 
are identified, then it is likely that some of these algorithms will 
partition the domain differently. This slight complication can be easily
handled, however, by forming what we shall call an intersection partition
on the given domain of tasks. The intersection partition is formed by
selecting one equivalence class from each partition and taking their
intersection. The collection of all possible non-empty intersections⁴
formed in this way generates the intersection partition. Generally,
the intersection partition is a finer partition of the domain than the
partition associated with any one algorithm. To assess behavior potential
simultaneously with respect to all of the identified algorithms, one
item from each equivalence class belonging to the intersection partition
is randomly selected for testing.

⁴To see in more detail how these intersections may be obtained, let $A^{k}_{j_k}$
represent an equivalence class associated with path $j_k$ of algorithm $k$.
The collection of intersection sets for $n$ algorithms can be generated by
taking $\bigcap_{k=1}^{n} A^{k}_{j_k}$, where the $j_k$ vary over all paths
of the algorithms. If there are $m_k$ paths in algorithm $k$, then there can
be at most $\prod_{k=1}^{n} m_k$ non-empty intersections.
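The construction in footnote 4 can be sketched directly: choose one class from each partition, intersect them, and keep the non-empty results. The function name and the toy domain below are illustrative assumptions; they are not the subtraction partitions used in the study.

```python
from itertools import product


def intersection_partition(partitions):
    """Return the non-empty intersections of one class from each partition.

    `partitions` is a list of partitions of the same domain, each given
    as a list of frozensets (the equivalence classes).
    """
    cells = []
    for combo in product(*partitions):          # one class per partition
        common = frozenset.intersection(*combo)
        if common:                              # discard empty intersections
            cells.append(common)
    return cells


if __name__ == "__main__":
    # Two hypothetical partitions of a six-item domain.
    by_algorithm_1 = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
    by_algorithm_2 = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
    for cell in intersection_partition([by_algorithm_1, by_algorithm_2]):
        print(sorted(cell))
```

In the toy example the two partitions have 2 and 3 classes, so at most 6 intersections are possible; only 4 are non-empty, and each is contained in a single class of either original partition.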

In order for this assessment procedure to be applicable to a given 
population of subjects, the observer must assume that he has refined the 
algorithms to a point where the subrules are atomic for most of the 
subjects. According to the theory, this is always possible in principle 
because the subrules of an algorithm may be decomposed into ever finer 
subrules. Indeed, rules can be reduced to associations (Arbib, 1969; 
Scandura, 1970a, 1970b, 1972; Suppes, 1969), which under memory-free
conditions are necessarily atomic. Although this can always be done
for a given population, what is gained at this level of atomicity is
lost in testing efficiency. More test items
are needed. In practice, the goal is to find some
optimal level of refinement.

The algorithmic technology also provides a basis for
ordering classes of problems according to difficulty.
Certain paths in an algorithm are superordinate to other
paths in that they contain all the atomic rules of the
subordinate path plus some of their own (e.g., path 4 of
the above algorithm is superordinate to paths 1, 2, and 3).
Since the superordinate path is more difficult (on the
basis of having more constituent rules) than a subordinate
path, and since the branching decisions in the
superordinate path account for all performance capable by means
of the subordinate path, it follows that if a subject can
use the superordinate path, he should also be able to use
the subordinate path. Hence, success on problems associated
with a superordinate path should imply success on all
problems associated with relatively subordinate paths. An
example of this partial hierarchical ordering is the
following lattice representing the ordering of paths for the
above algorithm.
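The ordering described above amounts to proper set inclusion among the rules used by each path, which a short sketch can make concrete. The sets below use the step numbers of the column addition program as stand-ins for atomic rules; that assignment is an assumption of this sketch, since the original arc labels are not available here.

```python
# Assumed rule sets: step numbers of the addition program used by each path.
PATH_RULES = {
    1: frozenset({1, 2, 3}),
    2: frozenset({1, 2, 3, 4, 5, 7}),
    3: frozenset({1, 2, 3, 4, 5, 6}),
    4: frozenset({1, 2, 3, 4, 5, 6, 7}),
}


def superordinate_pairs(path_rules):
    """Return (higher, lower) pairs where higher properly contains lower."""
    return {
        (hi, lo)
        for hi, hi_rules in path_rules.items()
        for lo, lo_rules in path_rules.items()
        if lo_rules < hi_rules          # proper subset test
    }


def implied_successes(passed_path, path_rules):
    """Paths on which success is predicted, given success on `passed_path`."""
    return sorted(lo for hi, lo in superordinate_pairs(path_rules)
                  if hi == passed_path)


if __name__ == "__main__":
    print(sorted(superordinate_pairs(PATH_RULES)))
    print(implied_successes(4, PATH_RULES))
```

Under these assumed sets, path 4 is superordinate to paths 1, 2, and 3, in agreement with the example in the text, while paths 2 and 3 are not comparable with each other.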




Empirical support for the above analysis was obtained
by Scandura and Durnin (reported in Scandura, 1971a, 1972).
In that study a variety of tasks were used and the subjects
ranged in ability from preschool to graduate level. The
atomic rules of an algorithm were given or "built into"
each subject and he was provided an opportunity to put
the rules together to solve problems belonging to the
domain of the algorithm. [The theory of structural
learning accounts for the combining of subrules through the
use of higher order rules (see Scandura, 1970a).] Each
subject was then tested on one item from each equivalence
class associated with a path of the algorithm. Based on
first test performance predictions were made concerning
performance on individual second test items. The results
of the study showed that prediction of combined success and
failure on second test items was possible with 96% accuracy.⁵
Furthermore, it was found that in 95% of the cases where a
subject was successful on a superordinate path he was also
successful on all subordinate paths.

To determine the accuracy of the above analyses
under classroom conditions an exploratory study was
conducted in which the atomic rules of the algorithms were
assumed rather than "built into" the subjects.

⁵The correlation between corresponding items was .92.

Forty-four subjects in two first-year high school
algebra classes were given two tests on factoring monic
trinomials shortly after they had completed a unit on that 
topic. The tests were devised by first identifying the
procedure used in the text and determining those rules
which the author of the text assumed the students knew
(i.e., that were atomic) and, then, constructing two sets
of test items corresponding to each path in the procedure. 

As in the previous study first test performance was
used to predict second test performance. The results of
the study showed that prediction on individual second test
items was possible with 85% accuracy.⁶ And in 87% of the
cases where a subject was successful on a superordinate
path he also was successful on all subordinate paths.

By way of summary, it is important to notice that
the algorithmic approach to assessing behavior potential
deals directly with all of the
questions raised earlier. It provides a theoretical basis
for categorizing classes of problems and assures that this
categorization partitions the domain of problems into
equivalence classes. It also provides a theoretical basis
for the hierarchical relationship between tasks and takes
into account the different ways in which a domain of tasks
may be solved. (The implication of this for task analysis,
of course, is that there can be more than one way of
hierarchically ordering problems within a given domain of tasks.
In fact, there is a different hierarchy for each rule
governing the domain.)

⁶The correlation between corresponding items was .60.

Granting the more rigorous theoretical foundations
for the algorithmic technology, its pragmatic value
relative to other existing technologies was still an open
question. The objective of this study was to help clarify
this issue. Specifically, we wanted to determine whether
or not the algorithmic approach to assessing behavior
potential was an improvement over the technologies developed
by Hively et al. (1968) and Ferguson (1969). The domain
of column subtraction problems was chosen for the
comparison because of the availability in the literature of
relevant information (i.e., Hively et al., 1968; Ferguson,
1969).



For the purposes of this study, improvement meant
one or more of the following:

a. an improvement in predictions concerning
the performance of individual subjects on
particular kinds of test items,
b. an improvement in the degree of generalizability
(from test items to a clearly
specified domain),
c. a reduction in the number of test instances
required to determine behavior potential, and
d. an improvement in the hierarchical ordering
of tasks (with its important implications
for conditional testing).
METHOD

The algorithmic technology was used to construct four algorithms 
for column subtraction. Two algorithms were based on a "borrowing"
procedure for subtraction and consisted of 6 and 5 paths, respectively. 

The other two algorithms were based on an "equal additions" procedure 
and consisted of 4 and 6 paths, respectively. The intersection partition 
with respect to all four algorithms was then constructed (see footnote 
4). It contained 12 equivalence classes. The flow chart of the sub- 
traction algorithm shown in Figure 1 was designed explicitly to have a 
path corresponding to each and every equivalence class in the intersection 
partition. 




Insert Figure 1 about here 

The directed graph, the twelve possible paths, and items from 
corresponding equivalence classes of the subtraction algorithm of Figure 
1 are shown in Figure 2. The numbered arcs in the graph and paths 
correspond to rules in the flow chart and the points to the initial 
(START), terminal (STOP) and branching rules of the flow chart. 



Insert Figure 2 about here 



Hively et al. (1968) used an item forms analysis of subtraction
problems to identify 28 subclasses of problems. Of these 28 subclasses,
the following 22 pertained to column subtraction:

1. Basic fact; minuend ≤ 10
2. Subtract 0
3. Answer = 0
4. Basic fact; minuend > 10
5. No borrow; no 0 in answer or problem
6. No borrow; x-0 fact in problem
7. No borrow; 0-0 fact in problem
8. No borrow; x-x fact in problem
9. No borrow; small; unequal lengths
10. No borrow; large; unequal lengths
11. Simple borrow
12. Simple borrow; one digit subtrahend
13. Simple borrow; one digit answer
14. Simple borrow; medium
15. Borrow; one digit from large number
16. Borrow; medium; subtrahend one digit short
17. Borrow; medium; unequal lengths
18. Separated borrows
19. Repeated borrows
20. Borrow across 0
21. Borrow across two (or more) 0's
22. Large numbers

With the exception of "Large numbers," which was omitted
from consideration because it included several of the other
categories (e.g., "Borrow; one digit from large number,"
"Repeated borrows," "Separated borrows," etc.), the item
forms in the above list were interpreted so as to represent
mutually exclusive classes of problems.⁷

By taking intersections of the 21 item forms with the 12 equivalence
classes generated by the algorithmic approach, 37 new classes of
subtraction problems, shown in Table 1, were obtained.

Insert Table 1 about here 

Prediction and criterion tests (parallel tests A and B respectively) 
were constructed by generating two arbitrary items for each of the 37 
classes in the intersection set obtained from item forms and equivalence 
classes, one for each test. The order of items was randomized in each 
test. 

Subjects and Procedures. The subjects were 34 ninth grade general
mathematics students attending summer school at Shaw Junior High School in
Philadelphia. Tests A and B were administered to the subjects in their 
classrooms on consecutive days. The order in which the tests were given 
was counterbalanced over subjects. Of the 34 subjects, 25 were in 
attendance both days and received both tests A and B. 

⁷There was one ambiguous class of problems (e.g., [example illegible]) which may be
interpreted as borrow or no borrow depending upon how one considers the
problem. Also, some of the item forms (i.e., classes of problems defined
by the item forms) are properly contained in other item forms. For example,
"Borrow; medium; subtrahend one digit short" is properly contained in
"Borrow; medium; unequal lengths." In this case, unequal lengths was
taken to mean that the minuend contained two or more digits more than the
subtrahend.

Using mutually exclusive item forms improved the level of item forms
predictions by 1%, so the present study
provides a more conservative comparison as regards the algorithmic approach.

Analysis of Results. Since Ferguson (1969) in his analysis
only identified hierarchical forms (see Fig. 3) involving
three or fewer digit numbers, comparison of the
assessment procedures was done in two parts: (1) for the
entire domain of column subtraction problems and (2) for
a restricted domain of subtraction problems, comparable
to Ferguson's hierarchical forms. The restricted domain
consisted of classes of problems (marked by * in Table 1)
in the intersection set associated with the first seven
equivalence classes and the thirteen item forms, 1-9,
11-13, and 19, pertaining to basic facts and no borrow
(minus large lengths), simple borrow, and repeated borrow,
respectively. Parallel tests, A* and B*, were constructed
for the restricted domain by deleting from tests A and B
items from those classes of problems not marked by an
asterisk.

In order to compare the item forms and algorithmic
approaches on the unrestricted
domain of subtraction problems, two subtests were
constructed for each technology, one from test
A and the other from test B. This was done
for each technology by randomly taking one test item from
each class of items associated with an item form or
equivalence class.

To compare performance on the restricted domain, a
pair of similar subtests was constructed from the restricted
tests A* and B* for each technology (algorithmic,
hierarchical forms, and item forms).

Performance on the unrestricted subtests
provided the basic data for comparison of the algorithmic
and item forms technologies for the unrestricted domain
of subtraction problems. Performance on the restricted
subtests provided the basic data for comparison of the
algorithmic, item forms, and hierarchical forms
technologies on the restricted domain of subtraction problems.
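The prediction rule used throughout, in which a subject's result on the test A item from a class serves as the prediction for the test B item from the same class, can be sketched as follows. The function name, the input layout, and the three summary figures (overall, after success on A, after failure on A) are assumptions chosen to mirror the comparisons described in the text; the sample data are invented for illustration and are not results from the study.

```python
def predictability(pairs):
    """Summarize how well test A outcomes predict test B outcomes.

    `pairs` holds one (a_correct, b_correct) tuple of booleans for each
    subject-by-class combination. Returns proportions of correct
    predictions; a proportion is None when no cases are available.
    """
    def proportion(hits):
        return sum(hits) / len(hits) if hits else None

    overall = [a == b for a, b in pairs]
    after_success = [b for a, b in pairs if a]           # predicted: success
    after_failure = [not b for a, b in pairs if not a]   # predicted: failure
    return {
        "overall": proportion(overall),
        "success": proportion(after_success),
        "failure": proportion(after_failure),
    }


if __name__ == "__main__":
    # Invented outcomes for five subject-by-class combinations.
    sample = [(True, True), (True, True), (True, False),
              (False, False), (False, True)]
    print(predictability(sample))
```

Reporting the success and failure cases separately matters because a test can look accurate overall while predicting failures poorly when most items are answered correctly.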
RESULTS AND DISCUSSION



Levels of Predictability. Table 2 shows the levels of predictability
and correlation between items belonging to the same class for each of
the various types of tests. The top half of Table 2 shows the levels
of predictability for tests measuring performance on the unrestricted
domain of subtraction problems.



Insert Table 2 about here 



In regard to the first criterion (p. 14), the overall levels of 
predictability on individual items were approximately the same for all 
unrestricted tests. However, the correlation between corresponding
test A and test B items for equivalence classes, .53, was significantly
greater (p < .05, Edwards, 1966, p. 82) than the correlation, .39,
between corresponding items for item forms. This correlation for
equivalence classes was also higher, although not significantly so,
than that for the intersection of equivalence classes and item forms
(.49).

The difference in correlations between equivalence classes and 
item forms was due to the significantly higher (p < .05, Edwards, 1966,
p. 53) levels of predictability for equivalence classes for those test 
A items on which subjects were not successful. Furthermore, the level
of predictability for those test A items on which subjects were not 
successful was also significantly greater (p < .05) for equivalence 
classes than for the intersection of item forms and equivalence classes. 
This latter result must be tempered, however, because the difference in 
levels of predictability between the intersection and equivalence classes 
for those test A items on which subjects were successful was also signi- 
ficant (p < .05). (The corresponding difference between equivalence 
classes and item forms was not significant.) 

In effect, the test constructed on the basis of the algorithmic 
technology with approximately 57% as many items (12 as compared to 21) 
gave better predictions on individual items than the corresponding test 
for item forms. Furthermore, tests formed from the two algorithms based
on "borrowing" (see p. 16) had 65% and 75% levels of prediction where
subjects were unsuccessful on test A items with overall levels of pre- 
dictability at 78%. These levels of prediction were obtained with only 
6 and 5 items for the respective tests. Hence, with considerably fewer 
items these tests were not only as effective in overall predictability as 
the intersection and item forms tests but also had higher (and for the 5 
item test significantly higher, p < .05) levels of predictability than 
the item forms test for those test A items where subjects were unsuccessful. 

It is also worth noting that of the four algorithms (see p. 16) 
originally identified, the two based on "borrowing" had significantly
higher (p < .05) levels of prediction than the two algorithms based on
"equal additions" where subjects were unsuccessful on test A items
(65% and 75% as compared to 29% and 32%). The implication of this, of
course, is that for these subjects the tests formed from algorithms 
based on "borrowing" were better predictors than the tests formed from 
algorithms based on "equal additions." This difference between the two 
types of subtraction appears to reflect the fact that "borrowing" is 
the more common procedure taught in American schools. 

The components of variance (Winer, 1962, pp. 184-191) shown in 
Table 3 are also relevant to criterion one (p. 14). Consider the
contribution of variance due to the interaction of subjects by items within
classes. Although this source contributed most of the variance for each 
of the three types of test on the unrestricted domain, the contribution 
was lowest for equivalence classes. Furthermore, the sources of variance 
due to classes and subjects by classes were greater for equivalence 
classes than item forms. These results tend to confirm the previous
finding that even with fewer items, the algorithmic approach was more
sensitive than the item forms technology in pinpointing strengths and 
weaknesses of individual students. 



Insert Table 3 about here 

The levels of predictability and correlation associated with the 
restricted domain are shown in the lower half of Table 2. None of the 
obtained results was significantly different. Restricting the domain, 
however, had the effect of increasing overall predictability for each
technology. Since most of the problems in the restricted domain appeared
to be relatively easy for the subjects, the levels of predictability for
"success" items were quite high. The relatively small number of errors
involved overall suggests that the low levels of predictability for items
on which subjects were not successful may have been due to careless
mistakes.

Components of variance could not be obtained for most of the tests 
in regard to the restricted domain because estimates of variance due to 
items within classes were negative for all restricted tests except item 
forms. In that case, the contribution of variance due to persons by 
items within item forms was 77%. 

Generalizability Results. In regard to the second criterion (p. 14),
Table 4 shows the coefficients of generalizability $\alpha'$ and
$\alpha'_s$ for each type of test.* The coefficient $\alpha'$ is a lower
bound estimate of how well one can generalize from a subject's obtained
score on a test to his performance on the stated domain of items
(Cronbach et al., 1963), in this case column subtraction problems. It is
also an intraclass correlation coefficient for estimating reliability
(Winer, 1962, pp. 124-132). The coefficient $\alpha'_s$ (Rajaratnam
et al., 1965) is an estimate of generalizability for stratified parallel
tests, tests for which the domain of items is divided into different
classes, as was the case in this study.

* $\alpha'$ and $\alpha'_s$ are estimates of generalizability from a
single test to a well-defined domain of items and correspond to
Cronbach's (1951) $\alpha$ and Rajaratnam et al.'s (1965) $\alpha_s$,
respectively, which are estimates of generalizability from the mean of
two or more parallel tests (to a well-defined domain).



Insert Table 4 about here 

The top half of Table 4 shows the coefficients of generalizability
for the unrestricted domain of subtraction problems. Of these, the
intersection test provided the highest estimates of generalizability;
those for equivalence classes were next; and item forms last. Again,
it is of interest to note that the two subtests formed from "borrowing"
algorithms had levels of generalizability as high as the subtest formed
from item forms. For the test with 6 items, $\alpha'$ = .75 and
$\alpha'_s$ = .60, and for the test with 5 items, $\alpha'$ = .64 and
$\alpha'_s$ = .62.
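
As a point of reference, Cronbach's (1951) coefficient $\alpha$, to which
$\alpha'$ is said to correspond, can be computed from a subjects-by-items
matrix of item scores. The sketch below computes the standard $\alpha$
only; it is not the single-test $\alpha'$ or the stratified $\alpha'_s$
reported in Table 4. The function name and the example scores are
hypothetical and are not data from this study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Standard Cronbach's alpha for a subjects-by-items score matrix.

    scores: list of rows, one per subject, each a list of item scores
    (here 1 = correct, 0 = incorrect). Assumes at least 2 subjects,
    at least 2 items, and a nonzero variance of total scores.
    """
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item sample variance
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 4 subjects by 3 items (not taken from the study).
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(example)
```

The coefficient rises as the item variances account for a smaller share
of the variance in total scores, that is, as subjects are ordered more
consistently across items.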

On the restricted domain of subtraction problems, the coefficients 
shown in the lower half of Table 4 for the restricted intersection, 
restricted item forms, and restricted equivalence classes were greater 
than the coefficients for hierarchical forms. 

The values of $\alpha'$ and $\alpha'_s$ obtained for the restricted tests
were not the same as those obtained for the unrestricted tests
($\chi^2$ = 20.6, 6 df, p < .01; $\chi^2$ = 26.19, 6 df, p < .01;
Edwards, 1966, p. 83). In effect,
a subject's score on a restricted test and in particular on the test
generated by hierarchical forms could not viably be generalized to the
entire domain of column subtraction problems. Hence, although the overall
levels of predictability for these tests were higher than those generated
from the unrestricted domain, the above results indicate that this was
accompanied by a significant loss in generalizability.

Efficiency Criterion. The data clearly show that the algorithmic
approach was more efficient than the item forms technology. Only 12,
as compared to 21, items were required to achieve about the same
overall level of predictability and somewhat better levels of
generalizability. The increase in efficiency evident with the tests formed
from the two "borrowing" algorithms is even more striking. With only 6
and 5 items, respectively, they had essentially the same levels of
predictability and generalizability as the item forms test with 21 items.

Furthermore, although it seems reasonable to suppose that the 
intersection test with 37 items would produce the highest levels of 
predictability and generalizability, in general this was not the case. 

With a third (12 as compared to 37) as many items, the algorithmic 
approach maintained as high a level of overall predictability and only 
slightly (nonsignificantly) lower levels of generalizability. The item 
forms test, which had slightly more than half the number of items as the 
intersection test, also obtained as high a level of predictability 
although somewhat lower levels of generalizability. Overall, these 
results lead one to suspect that under the testing conditions used the 
algorithmic approach for assessing mastery approaches asymptote. 

Further improvement would almost necessarily require more rigorous testing
conditions (cf., Scandura & Durnin in Scandura, 1972).

Even on the restricted domain the equivalence classes test appeared
to be the most efficient. Overall levels of predictability were the
same for all tests, while generalizability coefficients were somewhat
higher for the equivalence class and item forms tests. These higher levels of
generalizability, however, were obtained with half as many items in the
case of the equivalence classes test.

Hierarchical Analyses. The fourth criterion (p. 14) is concerned with
the fact that efficiency may sometimes be increased through the use of
conditional testing procedures, at least where the various items lend
themselves to Guttman (1947) type scaling. In the present study, however,
it must be noted that each of the technologies compared provides an
explicit basis for ordering items that is independent of empirical data.

Figures 3, 4 and 5, respectively, show the various hierarchies
(partial orderings) proposed for hierarchical forms (Ferguson, 1969), item forms
(Hively et al., 1968), and the algorithm of Figure 1.



Insert Figures 3, 4 and 5 about here 

The method of analysis used to determine the relative validity of
the three hierarchies was similar to that used by Gagne (1962) to confirm
relationships between higher and lower levels in task analysis.
In Table 5, the positive-positive (++) superordinate-subordinate
relationship shows for each hierarchy the number of cases where uniform
success on the two superordinate problems associated with a class implied
uniform success on all problems associated with relatively subordinate
classes. The (--) superordinate-subordinate relationship shows the number
of cases where failure on at least one of the superordinate problems in a
superordinate class implied failure on at least one of the relatively
subordinate classes. The (+-) superordinate-subordinate relationship shows
the number of cases where success on a superordinate class failed to
indicate success on all relatively subordinate classes. The (-+)
superordinate-subordinate relationship shows the number of cases where
there was uniform success on all subordinate classes but not on the
relatively superordinate class.



Insert Table 5 about here 

The ++ and -- relations, therefore, validate an ordering whereas
the +- relation contradicts one. The -+ relation is considered neutral.

The proportion of verifying cases to the number of verifying plus
contradictory cases was .82 for the equivalence classes hierarchy as
compared to .74 for the item forms hierarchy (p < .01). None of the
differences on the restricted domain were significant. To summarize,
then, the algorithmic approach not only provided the best and most
efficient method for assessing behavior potential, but the hierarchy
induced by the approach could be used to increase this efficiency even
more through the use of conditional testing procedures which involve
branching (with or without computer assistance).
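
The proportion used above can be illustrated with a short sketch. It
assumes each case has already been reduced to two pass/fail outcomes,
one for the superordinate class and one for its relatively subordinate
classes; the function names and the example counts are hypothetical and
are not the data of Table 5.

```python
from collections import Counter

def classify(super_pass, sub_pass):
    """Label one case as '++', '--', '+-' or '-+'.

    super_pass: True if the subject succeeded on both superordinate problems.
    sub_pass:   True if the subject succeeded on all relatively subordinate
                classes (False if he failed on at least one of them).
    """
    return ("+" if super_pass else "-") + ("+" if sub_pass else "-")

def verification_ratio(cases):
    """Verifying cases divided by verifying plus contradictory cases."""
    counts = Counter(classify(sup, sub) for sup, sub in cases)
    verifying = counts["++"] + counts["--"]   # relations that validate the ordering
    contradictory = counts["+-"]              # relation that contradicts it
    # The neutral '-+' cases enter neither the numerator nor the denominator.
    return verifying / (verifying + contradictory)

# Hypothetical counts: 7 '++', 2 '--', 1 '+-' and 3 '-+' cases.
cases = ([(True, True)] * 7 + [(False, False)] * 2
         + [(True, False)] * 1 + [(False, True)] * 3)
ratio = verification_ratio(cases)
```

Because the neutral cases are excluded, a hierarchy is penalized only
when success on a superordinate class fails to carry down to the classes
below it.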

Implications. On almost all measures obtained the algorithmic approach
to assessing behavior potential proved to be either better than, or at
least as good as, the technologies based on item forms or hierarchical
analysis. Nonetheless, at first thought the item forms technology might
appear to have a certain advantage over the algorithmic approach. Given
an item form, it is a routine matter to generate an instance of that item
form. This could be particularly useful in computer assisted testing (e.g.,
Shoemaker and Osburn, 1969; Ferguson, 1969), since the computer could
be programmed to randomly generate test items within forms. (The item
forms themselves, however, must be determined directly by the test
constructor.)
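
A minimal sketch of such random generation within a form follows. The two
forms shown (a "no borrow" form and a form defined by the exact set of
borrowing columns) are hypothetical stand-ins chosen for illustration;
they are not the item forms of Hively et al. (1968), and all function
names are assumptions.

```python
import random

def digits(n, width):
    """Decimal digits of n, ones column first, padded with zeros to width."""
    return [(n // 10**i) % 10 for i in range(width)]

def borrow_columns(minuend, subtrahend):
    """Columns (0 = ones) in which a borrow is needed; assumes minuend >= subtrahend."""
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    borrowed, columns = 0, []
    for i in range(width):
        if top[i] - borrowed < bottom[i]:
            columns.append(i)
            borrowed = 1      # this column takes one from the next column
        else:
            borrowed = 0
    return columns

def generate_no_borrow(rng, width):
    """Hypothetical form: width-digit minuend, no column requires a borrow."""
    top = [rng.randint(0, 9) for _ in range(width - 1)] + [rng.randint(1, 9)]
    bottom = [rng.randint(0, t) for t in top]   # each digit <= the digit above it
    to_int = lambda ds: sum(d * 10**i for i, d in enumerate(ds))
    return to_int(top), to_int(bottom)

def generate_with_borrows(rng, width, target):
    """Hypothetical form defined by the exact list of borrowing columns."""
    low, high = 10 ** (width - 1), 10 ** width - 1
    while True:               # rejection sampling; adequate for small widths
        minuend = rng.randint(low, high)
        subtrahend = rng.randint(0, minuend)
        if borrow_columns(minuend, subtrahend) == target:
            return minuend, subtrahend

rng = random.Random(0)        # fixed seed so the generated test is reproducible
easy_item = generate_no_borrow(rng, 3)
hard_item = generate_with_borrows(rng, 3, [0, 1])
```

The first generator builds an item directly from digit constraints, which
is the sense in which generation from a form is routine; the second
falls back on sampling and checking when the constraints are awkward to
state digit by digit.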

In the algorithmic approach this would have to be done indirectly. 
Nonetheless, the computer, once given an algorithm, could be programmed 
to automatically trace out the paths, identify the equivalence classes 
of problems, randomly generate test items in the equivalence classes, 
and order the items for testing. That is, the computer should be able
to generate not only the items but also the item forms (i.e., equivalence
classes) themselves.
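
The indirect route can be sketched as follows, under stated assumptions.
The procedure below is a hypothetical column-by-column borrowing routine,
not the algorithm of Figure 1, and a path is approximated by the set of
branches a problem exercises (labels 'S', 'B' and 'Z' are invented for
the sketch). Problems that exercise the same branches are placed in the
same equivalence class.

```python
from collections import defaultdict

def trace_path(minuend, subtrahend):
    """Subtract by borrowing and record which branches were used.

    Hypothetical branch labels: 'S' subtract a column directly,
    'B' borrow from a higher column, 'Z' borrow across a zero.
    Assumes minuend >= subtrahend >= 0.
    """
    top = [int(c) for c in reversed(str(minuend))]
    bottom = [int(c) for c in reversed(str(subtrahend))]
    bottom += [0] * (len(top) - len(bottom))
    branches, result = set(), []
    for i in range(len(top)):
        if top[i] < bottom[i]:
            j = i + 1
            while top[j] == 0:        # pass over zeros, turning each into 9
                top[j] = 9
                j += 1
                branches.add("Z")
            top[j] -= 1               # take one from the first nonzero column
            top[i] += 10
            branches.add("B")
        else:
            branches.add("S")
        result.append(top[i] - bottom[i])
    answer = sum(d * 10**i for i, d in enumerate(result))
    return frozenset(branches), answer

def equivalence_classes(problems):
    """Group problems by the set of branches they exercise."""
    classes = defaultdict(list)
    for minuend, subtrahend in problems:
        path, _ = trace_path(minuend, subtrahend)
        classes[path].append((minuend, subtrahend))
    return classes

problems = [(258, 13), (13, 6), (603, 578), (54, 27)]
classes = equivalence_classes(problems)
ordered = sorted(classes, key=len)    # fewer branches first, as a rough ordering
```

Sorting by the number of branches is only a crude substitute for the
partial ordering of paths; a fuller treatment would order classes by
inclusion of one path in another.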

Moreover, on further reflection, it becomes apparent that the more 
circuitous route required for generating test items via the algorithmic 
approach has a further major advantage. It provides an explicit basis 
for remedial instruction. To see this, we assume in accordance with 
Scandura's (1971a, 1971b, 1972) theory that subjects actually use rules 
(algorithms) to generate their behavior. Then, because each equivalence 
class of items corresponds to a unique path of a rule, and because the 
steps in each such path are known explicitly to the instructor (or
computer), each pupil can be given specific instruction to overcome his
inadequacies. Put succinctly, he can be taught the needed paths. These
ideas constitute the theoretical basis for a series of self-diagnostic
and remedial tapes and workbooks developed by the Mathematics Education
Research Group (e.g., Scandura, 1970c; Scandura, Gramick & Durnin, 1971)
and could be extended for use in computer assisted testing and
instruction.
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Table 3 

Components of Variance in Item Scores 
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Table 4 
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where is test variance, is item variance 
within a class and S* is class variance. 



Table 5

Pass (+)-Fail (-) Relationship Between Superordinate
Problems and Relatively Subordinate Problems
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Figure Captions 



Figure 1: Subtraction Algorithm 

Figure 2: Directed graph and paths of subtraction algorithm 

Figure 3: Hierarchical Forms adapted from Ferguson (1969) 

Figure 4: Hypothesized hierarchy for subtraction item forms 
adapted from Hively, Patterson, & Page (1968) 

Figure 5: Hierarchy of Paths based on Subtraction Algorithm
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Figure 1 : Subtraction Algorithm 



Figure 2: Directed graph and paths of subtraction algorithm
(Panels: Directed Graph, from START to STOP; Paths; Stimulus Instances
from Corresponding Equivalence Classes. Legible stimulus instances:
13 - 6, 258 - 13, 153 - 92, 54 - 27, 1563 - 875, 268 - 97, 1663 - 824,
603 - 578, 4029 - 3642, 1300 - 423, 16059 - 8797.)

Figure 3: Hierarchical Forms adapted from Ferguson (1969)

Figure 4: Hypothesized hierarchy for subtraction item forms
adapted from Hively, Patterson, & Page (1968)
(Legible node labels: Borrow; unequal lengths; medium. Repeated borrows.
Borrow across two 0's. Borrow; one digit from large number. Borrow;
medium; digit short. No borrow; x-0 fact. 0-0 fact. No borrow problem.
Basic fact. Subtract; minuend ~ 10.)
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Figure 5 : Hierarchy of Paths based on Subtraction Algorithm 
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